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Abstract  (cont'd) 

It  illustrates  that  the  natural  constraints  needed  to  solve  a  problem  in  inverse  optics 
can  be  extracted  directly  from  a  sufficient  set  of  input  data  and  the  corresponding 
solutions.  The  learning  procedure  has  been  implemented  as  a  parallel  algorithm  on  the 
Connection  Machine  System. 
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ABSTRACT: 

We  show  that  a  color  algorithm  capable  of  separating  illumination  from 
reflectance  in  a  Mondrian  world  can  be  learned  from  a  set  of  examples. 
The  learned  algorithm  is  equivalent  to  filtering  the  image  data  -  in  which 
reflectance  and  illumination  are  intermixed  -  through  a  center-surround  re¬ 
ceptive  field  in  individual  chromatic  channels.  The  operation  resembles  the 
“retinex”  algorithm  recently  proposed  by  Edwin  Land.  This  result  is  a  spe¬ 
cific  instance  of  our  earlier  result  that  a  standard  regularization  algorithm 
can  be  learned  from  examples.  It  illustrates  that  the  natural  constraints 
needed  to  solve  a  problem  in  inverse  optics  can  be  extracted  directly  from 
a  sufficient  set  of  input  data  and  the  corresponding  solutions.  The  learning 
procedure  has  been  implemented  as  a  parallel  algorithm  on  the  Connection 
Machine  System. 
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Computational  vision  has  derived  effective  solutions  to  early  vision  prob¬ 
lems  such  as  edge  detection,  stereopsis  and  structure  from  motion  by  exploit¬ 
ing  general  constraints  on  the  imaging  process  and  the  natural  world.  An 
important  question  is:  how  does  a  visual  system  learn  the  algorithms  it  uses? 
Can  they  -  and  the  underlying  natural  constraints  -  be  learned  automatically 
from  sets  of  examples?  We  have  explored  these  questions  for  the  computa¬ 
tional  problem  of  color  vision,  and  implemented  a  parallel  learning  procedure 
on  the  Connection  Machine  System. 

Color  constancy  points  to  a  difficult  computation  underlying  human 
color  vision.  We  do  not  merely  discriminate  between  different  wavelengths 
of  light;  we  assign  roughly  constant  colors  to  objects  even  though  the  inte-  - 
sity  signals  they  s<  nd  to  our  e  es  chan.ce  as  the  illui  inar,  >n  varie  acre  s 
space  and  chromatic  spectrum.  Perfect  color  constai  y  w  uld  rest  It  fre  a 
a  computation  that  extracts  the  invariant  spectral  ret  '>ctai  re  prop<  rties  f 
surfaces  from  the  image  intensity  signal,  in  which  reflectance  and  illumina¬ 
tion  are  mixed.  The  fact  that  the  colors  we  see  are  not  exactly  invariant 
suggests  that  our  visual  system  performs  a  computation  with  a  similar  goal, 
but  less  exact  results.  The  computation  is  typical  of  the  difficult  problems  of 
inverse  optics,  in  which  the  information  supplied  by  a  two-dimensional  image 
is  insufficient  by  itself  to  solve  for  a  unique  three-dimensional  scene.  Natural 
constraints  must  be  found  and  applied  to  the  problem  in  order  to  solve  it. 

“Rotinex”  lightness  algorithms,  pioneered  by  Land  (Land,  1959,  19S5, 
1986;  Land  and  McCann.  1971)  and  explored  by  others  (Horn,  1974;  Blake. 
1985;  Hurlbert,  19S6)  illustrate  one  successful  approach  to  the  computation. 
The  retinex  algorithms  assume  that  the  visual  system  performs  the  same 
computation  in  each  of  three  independent  chromatic  channels.  The  algo¬ 
rithms  further  assume  that  in  each  channel,  the  image  intensity  signal,  or 
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more  precisely,  the  image  irradiance ,  s',  is  proportional  to  the  product  of 
the  illumination  intensity  e'  and  the  surface  spectral  reflectance  r'  in  that 
channel: 


s'(x,y)  =  r'(x,y)e'(x,y).  (1) 

This  form  of  the  intensity  equation  makes  the  implicit  assumption  that  the 
irradiance  s'  has  no  specular  components  and  that  the  color  channels  have 
been  chosen  appropriately.  1 

Retinex  algorithms  seek  to  solve  Equation  (1)  for  lightness,  which  is  an 
approximation  to  r'(x ,  y ),  in  each  channel.  The  resulting  triplet  of  lightnesses 
labels  a  constant  color  in  color  space.  To  make  a  solution  possible,  retinex 
algorithms  restrict  their  domain  to  a  world  of  Mondrians,  two-dimensional 
su. faces  covered  with  >atches  of  random  colors  (see  Figure  1).  The  algo- 
rit'ims  then  make  the  •  xplicit  ;tssumptions  th  it  (1)  r'(x ,  y  I  is  uniform  within 
pi  ches  nit  has  harp  discontinuity  s  at  edge:  between  patches  and  that  (2) 
e'{  x,y)  varies  smoothly  across  the  Mondrian. 

Most  retinex  algorithms  first  take  the  logarithm  of  both  sides  of  Equa¬ 
tion  (1),  converting  it  to  a  sum: 


Hx,y)  -  r(r,  y)  +  e(T,y),  (2) 

where  s  =  log  s',  r  =  log  r'  and  e  =  log  e' .  The  two  assumptions  are 
then  exploited  to  break  down  the  sum  into  its  two  components. 

The  most  recent  retinex  algorithm  (Land.  19SG;  employs  an  operator 
that  divides  the  image  irradiance  of  Equation  ( 1 )  at  each  location  by  a 
weighted  average  of  the  irradiance  at  all  locations  in  a  large  surround.  The 
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Figure  1.  A  Mondrian  under  an  illumination  gradient,  generated  by  adding  together 
two  320x320  images:  one  is  the  (log)  reflectance  image,  an  array  of  rectangles 
each  with  a  different,  uniform  grey-level  (see  Figure  2(a));  the  other  is  the  (log) 
illumination  image,  in  which  the  pixel  values  increase  linearly  in  the  same  way 
across  each  row. 


log  of  the  operator’s  result  is  called  lightness.  The  triplets  calculated  by 
the  algorithm  fall  close  to  the  colors  humans  see  when  viewing  a  Mondrian 
under  illuminants  with  strong  spatial  gradients.  The  form  of  this  operator  is 
similar  to  that  derived  in  our  earlier  formal  analysis  of  the  lightness  problem 
(see  Hurlbert,  1980).  The  main  difference  between  the  two  is  that  the  ana¬ 
lytically  derived  operator  takes  the  log  of  irradiance  before  averaging,  and  so 
is  linear  in  the  logs,  whereas  Land’s  algorithm  averages  before  taking  the  log, 
and  so  is  not  linear  in  the  logs.  As  discussed  below,  the  numerical  difference 
between  the  two  results  is  small. 

YVe  set  out  to  see  whether  a  simple  algorithm  could  learn  from  examples 
how  to  extract  reflectance  from  image  irradiance.  and  whether  what  it  would 


learn  would  resemble  one  of  the  above  algorithms.  The  examples  we  use  are 
pairs  of  images:  an  input  image  of  a  Mondrian  under  illumination  that  varies 
smoothly  across  space  and  an  output  array  that  displays  the  reflectance  of  the 
Mondrian  separately  from  the  illumination.  We  then  synthesize  an  operator 
from  the  examples  by  finding  the  Imcar  estimator  that  best  maps  the  input 
into  its  two  output  components,  using  optimal  linear  estimation  techniques. 

For  computational  convenience  we  train  the  operator  on  one-dimensional 
vectors  that  represent  horizontal  scan  lines  across  the  Mondrian  images  (see 
Figure  2).  We  generate  many  different  input  vectors  s  by  adding  together 
different  random  r  and  e  vectors,  according  to  Equation  (2).  Each  vector  r 
represents  a  pattern  of  step  changes  across  space,  corresponding  to  one  row 
of  he  01  tput  reflectai  ,-e  image'  (see  Figure  3a).  Each  vector  e  represents  a 
su  >oth  radient  .cros  space  with  a  random  offset  and  slope,  corresponding 
to  one  r  >w  of  tl  •  out  mt  illuininat  on  image.  (In  our  implementation,  we 
a]  pended  r  to  e  to  create  an  output  vector  twice  as  long  as  the  input.)  We 
then  arrange  the  “training  vectors"  ,s  and  r  as  the  columns  of  two  matrices 
5  and  R ,  respectively.  Our  goal  is  then  to  compute  the  optimal  solution  L 
of 

LS  =  R,  (3) 

where  L  is  a  linear  operator  represented  as  a  matrix. 

It  is  well  known  that  the  solution  of  this  equation  that  is  optimal  in  the 
least  squares  sense  is 

L  =  RS+ ,  (4) 

where  5+  is  the  Moore-Penrose  pseudoinverse  (see,  for  example,  Albert 
1972).  We  compute  L  using  the  technique  of  regularizing  the  pseudoinverse  to 
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Figure  2.  (a)  and  (b)  A  pair  of  one-dimensional  examples  like  those  used  to  train  the 
algorithm,  (a)  shows  the  input  data,  which  is  a  random  Mondrian  reflectance  pat¬ 
tern  superimposed  on  a  linear  illumination  gradient  with  a  random  slope  and  offset. 
Each  example  can  be  thought  of  as  a  horizontal  scan  line  across  a  Mondrian  such 
as  the  one  in  Figure  1  (which  was  generated  by  stacking  similar  one-dimensional 
examples).  Each  example,  320  pixels  in  length,  has  a  different  reflectance  pattern 
arid  a  different  linear  illumination  gradient,  (b)  shows  the  corresponding  output 
solution,  in  which  the  illumination  and  the  reflectance  have  been  separated  and 
concatenated.  We  used  1500  such  pairs  of  input-output  examples  to  train  the  op¬ 
erator  shown  in  Figure  4.  (c)  shows  the  result  obtained  by  the  trained  operator 
when  it  acts  on  the  input  data  (a),  not  part  of  the  training  set.  1:  should  be  com¬ 
pared  with  (b).  This  result  is  fairly  typical:  in  some  cases  the  prediction  is  even 
better,  in  others  it  is  worse. 
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obviate  numerical  stability  problems  (see  Appendix  3).  The  number  of  exam¬ 
ples  we  use  is  significantly  larger  than  the  number  of  elements  in  each  vector 
in  S,  in  order  to  overconstrain  the  problem.  Once  L  is  computed  it  is  tested 
on  new  scan  lines,  generated  in  the  same  way.  The-  pseudoinverse  may  also 
be  computed  by  recursive  techniques  which  improve  its  form  as  more  data 
become  available  (see  Appendix  3).  The  latter  procedure,  although  equiva¬ 
lent  to  our  “one-shot”  computing  technique,  may  seem  intuitively  more  like 
learning. 

We  find  that  the  trained  operator  L,  given  a  new  s  as  input,  recovers 
a  good  approximation  to  the  correct  output  vectors  r  and  e.  Operating 
on  a  two-dimensional  Mondrian,  generated  by  stacking  appropriately  many 
one-dimensional  s  vectors,  L  also  yields  a  satisfactory  approximation  to  the 
correct  output  image  (see  Figure  3b). 

It  seems  that  our  scheme  has  successfully  learned  an  algorithm  that  per¬ 
forms  the  color  computation  correctly  in  a  Mondrian  world.  What  algorithm 
has  been  learned?  What  is  its  relationship  to  the  filters  described  above? 
To  answer  these  questions  we  examine  the  structure  of  the  matrix  L.  We 
expect  that  because  the  operator  should  perforr .  he  same  action  on  each 
point  in  the  image,  i.e.  that  it  should  be  space-invariant,  the  central  part 
of  L  should  be  a  convolution  matrix,  in  which  each  row  is  the  same  as  the 
row  above  but  displaced  by  one  element  to  the  right.  In  the  peripheral  parts 
of  the  matrix,  this  form  will  be  corrupted  by  boundary  effects.  Inspection 
of  the  matrix  and  appropriate  averaging  of  the  relevant  rows  (see  Figure  4) 
confirm  these  expectations.  Like  Land’s  psychophysically  tested  filter,  it  has 
a  narrow  positive  peak  and  a  broad,  shallow,  negative  surround  (see  Figure 
4)  that  extends  beyond  the  range  we  can  observe,  but  not  over  the  entire 
image. 


Figure  4.  (a)  The  (log)  reflectance  image  that  is  one  component  of  t ho  Mondrian 
of  Figure  1.  This  image  represents  the  Mondrian  of  Figure  1  under  uniform  illumi¬ 
nation.  (b)  The  (log)  reflectance  imago  that  the  trained  operator  produces  when 
it  acts  on  the  Mondrian  of  Figure  1.  The  operator  has  been  trained  on  a  set  of 
one-dimensional  examples  different  from  those  used  to  generate  the  Mondrian  of 
Figure  1. 

Our  algorithm  is  not  exactly  identical  with  Land's:  the  filter  of  Figure 
4  subtracts  from  the  value  at  each  point  the  average'  value  of  the  logs  at 
till  points  in  the  field,  rather  than  the  log  of  the  average  value's.  As  men- 
tiemeel  a  hove-,  the'  difference1  between  the-  outputs  of  the  two  tillers  is  small  in 
most  case's,  anel  l>e>th  agree  we'll  with  psychophysiea]  re 'suits  (Land,  persemal 
oommunie'ariein).  We-  have'  explicitly  e-ompare'el  the1  pri  feu  mane-e-  <  »f  Land-- 
■.rheme  e  ill  the  Mi  UH  If  la  11  <  if  Figure*  1  with  a  scheme-  equivalent  til  <  >U1  leal  lit  I ! 
algeiiirhm  (hireling  tin-  log  of  the  image  through  a  fillet  like  that  shown  in 
F  Loire-  4  >.  i  he  result  ing  an  ays  approx  i ma  n  f  h<  ■  ee  >i  p  rt  :  >u  f  ]  >ut  e-  |tia  II  v  w<  !!. 
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Figure  4.  The  learned  filter,  which  extracts  reflectance  from  a  Mondrian  image 
under  a  linear  illumination  gradient.  The  operator  thai  is  learned  is  a  matrix  that 
acts  on  one-dimensional  vectors.  Assuming  that  the  operator  is  space  invariant 
when  boundary  effects  can  he  ignored,  we  estimate  t  lie  shape  of  the  corresponding 
filter  by  summating  the  central  rows  of  the  matrix,  shifting  them  appropriately. 
The  one-dimensional  "receptive  field”  that  results  has  a  sharp  positive  peak  and  a 
shallow  surround  that  extends  beyond  the  range  we  cat;  estimate  reliably,  which  is 
the  range  we  show  here.  Tin*  filter  shown  here  was  learned  Irani  a  set  of  examples 
with  linear  illumination  gradients  (see  Figure  2).  When  logarithmic  illumination 
gradients  are  used,  a  qualitatively'  similar  receptive  field  is  obtained.  In  a  separate 
experiment  wo  used  a  training  set  of  one-dimensional  Mondrians  with  cither  lin¬ 
ear  illumination  gradients  or  slowly  varying  sinusoidal  illumination  with  random 
wavelength,  phase  and  amplitude.  The  resulting  filter  is  shown  in  the  inset.  The 
inhibitory  surround  clearly  decays  back  to  zero. 
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To  investigate  the  way  in  which  the  shape  of  the  learned  operator  varies 
with  the  type  of  illumination  gradient  on  which  it  is  trained,  we  constructed 
a  second  set  of  examples.  In  addition  to  vectors  with  random  linear  illumi¬ 
nation  gradients,  this  set  contained  an  equal  pumber  of  vectors  with  random 
sinusoidally  varying  illumination  components.  The  operator  trained,  on  this 
mixture  of  illumination  types  differs  from  the  operator  trained  on  strictly 
linear  ones  in  the  shape  of  its  surround.  Whereas  the  latter  has  a  broad 
negative  surround  that  remains  virtually  constant  throughout  its  extent,  thq 
new  operator’s  surround  (see  Figure  4b)  has  a  smaller  extent  and  returns 
smoothly  to  zero  from  its  peak  negative  value  in  its  center. 

The  difference  between  the  two  operators  illustrates  an  interesting  fea¬ 
ture  of  the  learning  algorithm:  it  adapts  to,  its  environment.  The  results 
imply  that  the  optimal  operator  for  images  with  strictly  linear  illumination 
gradients  is  one  whose  surround  takes  a  constant  average  ove*  a  range  small*  r 
than  the  entire  image.  On  the  other  hard,  th<  surrou  *d  c.f  'he  optimal  o  - 
erator  for  images  with  smoothly  varying  illumination  adi*  .its  is  a  1  >w-pa  s 
filter  that  separates  the  illumination  from  the  sharply  varying  reflectance. 

Our  learning  procedure  is  motivated  by  our  previous  observation  (Poggio 
and  Hurlbert,  1984;  see  also  Poggio  et  al.,  1985b)  that  standard  regulariza¬ 
tion  algorithms  in  early  vision  define  linear  mappings  between  input  and 
output  and  therefore  can  be  learned  associatively  under  certain  conditions 
(see  Appendix  3).  Our  algorithm  synthesizes  the  optimal  linear  operator  L 
that  maps  as  closely  as  possible,  in  the  least  squares  sense,  the  image  irradi- 
ance  into  its  reflectance  and  illumination  components  for  this  class  of  images 
and  illumination  gradients.  The  technique  of  optimal  linear  estimation  that 
it  uses  is  closely  related  to  optimal  Bayesian  estimation  (see  Albert,  1972 
and  Appendix  4). 
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Figure  5.  In  a  separate  experiment  we  trained  the  operator  on  a  set  of  1500  one- 
dimensional  Mondrians  with  either  linear  gradients  of  illumination  or  sinusoidally 
varying  illumination  with  random  wavelengths  between  0.5  and  1.5  times  the  length 
of  the  pattern,  and  random  phases  and  amplitudes.  We  show  here  how  the  operator 
performs  on  a  new  input  pattern  with  a  linear  illumination  gradient  (left)  and  on  a 
pattern  with  a  sinusoidal  illumination  (right).  As  in  Figure  2,  (a)  shows  the  input 
data.  (!>)  shows  the  corresponding  output  solution,  in  which  the  illumination  and 
the  reflectance  have  been  separated  and  concatenated.  (  )  shows  the  result  the 
trained  operator  yields  when  it  acts  on  the  input  data  (a),  not  part  of  the  training 
set.  It  should  be  compared  with  (b).  The  filter  corresponding  to  this  operator  is 
shown  in  the  inset  of  Figure  f. 

The  learned  L  acts  on  new  input  images  to  yi  Id  good  approximations 


to  the  correct  output  images.  Note  that  the  result  of  applying  the  matrix  L 
to  a  two-dimensional  Mondrian  image  (see  Figure  3)  is  not  as  good  as  the 
one-dimensional  examples  suggest.  The  reason  is  that  the  matrix  L  has  been 
learned  from  one-dimensional  examples,  and  in  the  case  of  Figure  3  it  has 
been  applied  to  each  row  of  the  Mondrian  independently.  Small  errors  in 
offsets  and  scaling  factors  for  each  row  that  would  only  slightly  affect  the 
one-dimensional  result  become  more  obvious  as  they  vary  from  row  to  row 
in  the  two-dimensional  case. 

Training  of  the  operator  on  two-dimensional  examples  is  possible,  but 
computationally  very  expensive  if  done  in  the  same  way.  The  present  com¬ 
puter  simulations  require  several  hours  when  run  on  standard  serial  com¬ 
puters  (up  to  20  hours  on  a  Symbolics  3040  for  512x512  images).  The  two- 
dimensional  case  will  need  muca  more  time.  We  expec  that  it  will  be  fea."  - 
ble  only  on  the  latest  version  of  the  Com  ectiou  Machine  (tl  e  65K-p  ocess  r 
CM-2  with  floating  point  hardware)  of  Tl  inking  Machi  ies  C  >rporati<  >n.  Or 
one-dimensional  learning  scheme  runs  orders  of  magnitude  faster  on  a  CM- 
1  Connection  Machine  System  with  16Iv-processors.  It  is  possible  to  use 
much  more  efficient  methods  of  computing  the  pseudoinverse  and  especially 
approximations  to  it  (see  for  example  Albert,  1972  and  Ivohonen,  1977). 

The  calculation  of  a  regularized  pseudoinverse  may  also  be  implemented 
by  parallel  networks  of  simple  processors  or  by  analog  networks  that  bear 
some  resemblance  to  biological  systems.  In  particular,  it  could  be  computed 
by  so-called  “neural”  networks  using  a  gradient  descent  method  (also  called 
back-propagation  in  recent  papers,  see  Rumelhart  et  al.,  1986)  and  linear 
units.  Since  the  pseudoinverse  is  the  best  linear  approximation  in  the  £2 
norm,  gradient  descent  minimizing  the  square  error  between  the  actual  out¬ 
put  and  desired  output  of  a  fully  connected  linear  network  is  guaranteed  to 
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converge,  albeit  slowly.  Thus  gradient  descent  in  weight  space  would  give  the 
same  result  we  obtain,  the  global  minimum.  In  terms  of  a  network  of  linear 
units,  the  training  scheme  we  have  run,  using  one-dimensional  Mondrians 
512  pixels  long,  corresponds  to  learning  the  equivalent  of  up  to  about  half  a 
million  weights. 

The  significance  of  our  result  lies  not  so  much  in  the  specific  estimation 
technique  we  used,  but  in  the  form  of  the  filter  we  obtain.  It  is  qualitatively 
the  same  as  that  which  results  from  the  direct  application  of  regularization 
methods  exploiting  the  spatial  constraints  on  reflectance  and  illumination 
described  above  (see  Table  1  in  Poggio  et  al.,  1985a;  Poggio  and  staff,  1985b; 
Appendix  2).  The  Fourier  transform  of  the  filter  of  Figure  4  is  approxi 
m  1 1 ely  a  bandpass  filter  that  cuts  out  low  frequencies  due  to  slow  gradients 
of  illumination  and  preserves  intermediate  frequencies  due  to  step  changes 
in  reflectance.  The  large  “inhibitory"  surround  also  provides  normalization 
to  average  grey  in  the  field  (see  Hurlbert,  19S6). 

We  do  not  think  that  our  results  mean  that  color  constancy  may  be 
learned  during  a  critical  period  by  biological  organisms.  It  seems  more  rea¬ 
sonable  to  consider  them  simply  as  a  demons tr-o  ion  on  a  toy  world  that 
evolution  may  recover  and  exploit  natural  constraints  hidden  in  the  physics 
of  the  world.  The  shape  of  the  filter  in  Figure  4  is  suggestive  of  the  “non- 
classical"  receptive  fields  that  have  been  found  in  Y4,  a  cortical  area  im¬ 
plicated  in  mechanisms  underlying  color  constancy  (Desimone  et.  ah,  19S5: 
Wild  et  ah,  19S5;  Zeki.  19S3a.b). 

Finally,  it  is  important  to  stress  that  the  general  solution  of  the  problem 
of  color  constancy  requires  much  more  work:  real  images  arc  noisy,  objects 
are  three-dimensional,  and  there  are  shading,  shadows,  and  specularities.  We 


are  presently  extending  the  simple  learning  technique  described  here  in  order 
to  deal  with  the  full  complexity  of  real  scenes. 


Appendix  1 
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The  Intensity  Equation 


This  appendix  illustrates  the  transformation  of  the  photoreceptor  activ¬ 
ity  necessary  in  order  to  decompose  the  intensity  equation  into  a  sum  of  two 
components  representing  the  surface  reflectance  and  the  source  illumination 
intensity. 

In  most  natural  scenes,  the  light  reflected  from  objects  includes  both 
diffuse  and  specular  components.  To  simplify  the  intensity  equation,  we 
assume  that  all  reflection  is  Lambertian,  or  that  there  are  no  specularities.  In 
this  case,  the  light  reflected  by  a  surface  is  solely  the  product  of  its  spectral 
reflectance  and  the  illumination  intensity  that  falls  on  it.  The  amount  of 
reflected  light  that  reaches  the  eye  further  depends  on  the  angles  between  the 
illumination  source,  the  reflecting  surface,  and  the  eve,  and  the  response  of 
th  ■  eye  to  the  light  depends  on  the  pectral  sensitivity  of  its  photoreceptors. 
We  ma\  therefoie  write  the  intensity  signal  registered  by  the  eye  as: 


•5* 


s  V)  —  l°9  J  «"( A  )/•'(  A.  ,r  )r'(  A.  .r  )</,\, 


(.41.1) 


where  v  labels  the  spectral  type  of  the  photoreceptor  (u  —  1.  ...4  for  humans, 
counting  3  cone  types  and  1  rod  type).  '''"(A)  is  the  spectral  sensitivity  of 
the  nth-type  photoreceptor  and  s'l/(.r)  its  activity.  /'( A,r)  is  the  surface 
spectral  reflectance  and  r'(A,.r)  the  cffr.ctive  irradtan< c.  We  group  together 
the  geometric  factors  influencing  the  intensity  signal  by  defining  the  effective 
irradiance  as  the  ambient  illumination  modified  by  the  orientation,  shape. 


V //vy 


;uk1  location  of  tin.*  reflecting  surface*.  The  effective  irradiance  is  the  intensity 
that  the  surface  would  reflect  if  it  were  white,  that  is,  if  r'(A,x)  =  1. 


Equation  A  1.1  is  not  separable  into  a  product  of  reflectance  and  illumi¬ 
nation  components  in  a  single  color  channel.  To  make  it  separable,  we  must 
make  a  transformation  to  new  color  channels  that  are  combinations  of  the 
photoreceptor  activities.  We  first  choose  basis  functions  p'( A)  and  <?‘(A)  such 
that  for  most  naturally-occurring  illuminants  and  surface  reflectances 


e'(A,x)  «  A)e'l(x) 

J 


(•41.2) 


The  basis  transformation  (for  a  review  of  the  origins  of  this  idea  see 
Maloney,  1985;  see  also  Buchsbaum,  1980  and  Yuille,  1984)  leads  to  the 
following  equation 


*s  ^(x)  =  Tfjt  '(x )r  }(x), 


(-41.3), 


where  the  tensor  T  is  defined  as 


rt/  =z  I 

'-J  1 


d\a‘,(\)p'(\)qJ(\), 


where  v  =  1.  ...4  and  /.  j  =  1,  ...TV,  the  p's  and  the  q's  are  the  basis  functions 
for  the  illuminant  and  for  the  albedo,  respectively,  and  the  sum  is  taken  over 
repeated  indices. 

To  simplify  further  analysis,  we  impose  the  conditions  that  the  p‘  =  q' 
and  that  the  //(  A)  are  orthogonal  with  respect  to  the  au(\).  This  orthogonal¬ 
ity  is  insured  if,  for  example,  the  p'(  A)  do  not  overlap  with  respect  to  A.  In  the 
simplest  case,  the  basis  functions  mav  be  ,iono<  hroinat  •:  v'  A)  =  — A. 
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S\N‘ 


Substituting  A1.2  into  Al.l  then  yields: 


s  v(x)  =  log  y  e  '(r)r  '(j)  JdXa^p'WiX)  (-41.4) 

I 

If  we  define  the  matrix  Tf  —  f  dXat,(X)p,(X)p'(X),  where  u  =  1,...4 
and  i  —  1,  ...N,  we  obtain 


i 


antilog  s  "(x)J  =  T^.e  *( x)r  '(x)  (-41.5) 


If  the  p'(X)  are  suitably  chosen,  T„,  is  invertible.  Then  the  linear  equa¬ 
tions  represented  in  A1.5  yield  the  following  solution: 


# 


is 


m 


« 

F 


(T„i)  1  antilog 


s  *V)  =  e‘(^)r‘(j) 


(-41.G) 


or, 


s  ’(x)  =  t  '(x)r  '( x) 


(-41.7) 


where  s  '(x)  =  (T,,, )  lantilog 


’(*) 


.  Taking  logarithms  of  Al.7  yields 


or 


log  s  '(x)  =  log  e  '(./•)  +  log  r  *(x) 


s'(x)  =  t’(j-)  +  r'(r)  i  =  1 . V. 


(-4 1-S) 


where  .s'(.r)  =  log  ,s  '(.r)  and  so  on.  which  is  the  desired  equation.  The 

extension  to  the  two-dimensional  case  is  clear. 

s‘(.r  )  is  a  linear  combination  of  the  activity  of  different  types  of  photore¬ 
ceptors.  It  is  important  to  note  that  the  index  i  labels  not  the  color  channels 
associated  with  the  spectral  sensitivities  of  the  diff<  lent  photoreceptor  types 


but  new  channels  which  may  be  similar  to  the  biological  color-opponent  chan¬ 
nels.  There  is  no  a  prion  limit  on  the  number  of  new  channels  formed  by 
linear  combinations,  but  efficiency  of  information  transmission  would  require 
it  to  be  close  to  the  number  of  photoreceptor  types. 


Appendix  2 

A  Regularization  Algorithm  for  Color  Computation 


2.1  The  regularization  approach 

Consider  the  equation 


V  =  Az  (A  2.1) 

where  A  is  a  known  operator.  The  direct  problem  is  to  determine  y  given  z. 
The  inverse  problem  is  to  find  z,  given  y  -  the  function  y  becomes  the  “data” 
and  z  the  solution.  Although  the  direct  problem  is  usually  well-posed,  the 
inverse  problem  is  often  ill-posed. 

Standard  regularization  theories  for  “solving”  ill-posed  problems  have 
been  developed  by  Tikhonov  (Tikhonov  and  Arsenin,  1977)  and  others.  The 
basic  idea  underlying  regularization  techniques  is  to  restrict  the  space  of 
acceptable  solutions  by  choosing  the  one  solution  that  minimizes  an  appro¬ 
priate  functional.  Among  the  methods  that  can  be  employed  (see  Poggio.  et. 
al.  1985a),  the  main  one  is  to  find  z  that  minimizes 

\\Az  -  y\\2  +  \\\Pz\\2.  (A2.2) 

The  choice  of  the  norm  |j  ■  ||,  usually  quadratic  as  in  Equation  A2.2,  and  of 
the  linear  stabilizing  functional  ||.P-||,  is  dictated  by  mathematical  considera¬ 
tions,  and  most  importantly,  by  a  physical  analysis  of  the  generic  constraints 
on  the  problem.  The  regularization  parameter  A  controls  the  compromise 
between  the  degree  of  regularization  of  the  solution  and  its  closeness  to  the 


2.2  Regularizing  the  Color  Computation 

Equation  Al.8  is  impossible  to  solve  in  the  absence  of  additional  con¬ 
straints:  there  are  twice  as  many  unknowns  as  equations  for  each  x.  Formally, 
the  problem  is  ill-posed.  Regularization  techniques  can  be  used  to  restrict 
the  number  of  allowed  solutions  for  e‘(x)  and  r'(x),  and  thereby  reduce  the 
number  of  unknowns,  by  imposing  constraints  on  the  imaging  process  and 
the  physical  properties  of  the  surfaces  and  the  source  illuminant. 

One  constraint  that  may  be  used  is  spatial  regularization  (for  other  con¬ 
straints  -  spectral  regularization  and  the  single  source  assumption  -  see  Hurl- 
bert,  2001  and  Poggio,  1985b).  The  spatial  regularization  constraint  formal¬ 
izes  and  extends  the  retinex  assumptions  that  (a)  r'(x)  is  either  constant  or 
changes  sharply  at  boundaries  between  different  materials,  and  (b)  e'(x)  is  ei¬ 
ther  constant  or  changes  more  smoothly  than  r'(x)  across  space.  One  retinex 
algorithm  (Horn,  1974),  for  example,  imposes  the  strong  constraint  (in  two- 
dimensions)  that  all  values  of  V2s*(x,  y)  strictly  below  a  fixed  threshold  are 
due  to  e'(x,y).  The  regularization  assumption  requires  only  that  e*(x)  vary 
less  sharply  across  space  than  r*(x)  and  effectively  allows  the  limit  on  the 
spatial  variation  of  e‘(x)  to  be  reset  for  each  scene. 

Standard  regularization  techniques  enforce  this  constraint  on  Equation 
Al.8  by  requiring  that  its  solution  minimize  the  following  variational  princi¬ 
ple: 

E(s'  -  (’■'  +  C)]2  +  +  /i[G  .  r'  +  l  (.42.3) 

where  G  is  a  gaussian  filler  wit  i  standard  deviation  .  lie  first  t-  rm  r<  - 
quires  that  the  solut  on  (r*  +  e')  be  close  to  s'\  the  second  term  enforces  the 
constraint  that  the  illumination  vary  smoothly  across  space;  and  the  third 
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m 
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t<  m  en  orces  th  con:  raint  that  th<  reflect, a  :ce  vary  not  t  oo  smootl  ly  across 
S]  ice. 

Minimizing  Equation  A‘2.3  demonstrates  that  the  solutions  r '  and  e1 
may  be  obtained  by  filtering  sl  through  a  linear  filter.  In  the  Fourier  domain 
we  derive  the  result: 


r'(«)  = 


(1  +  /3e-“3"2  +  /37u;2  +  +  /37u>4)(1  +  Au>2)  -  1 


(A2A) 


Note  that  the  quadratic  variational  principle  of  standard  regularization 
cannot  enforce  the  spatial  regularization  constraint  with  full  generality.  A 
more  general  regularization  scheme  based  on  Markov  random  fields,  which 
leads  to  standard  regularization  as  a  special  case,  is  sketched  by  Poggio  and 
staff  (1985). 


*  t  i 


Appendix  3 

Learning  a  Regularization  Algorithm 


3.1  Associative  Learning  of  Standard  Regularizing  Operators 

Minimization  of  the  regularization  principle  A2.2  corresponds  to  deter¬ 
mining  a  regularizing  operator  that  acts  on  the  input  data  y  and  produces 
as  an  output  the  regularized  solution  2.  Suppose  now  that  instead  of  solv¬ 
ing  for  z ,  we  are  given  y  and  its  regularized  solution  z  and  we  want  to  find 
the  operator  that  effects  the  transformation  between  them.  This  appendix 
demonstrates  that  the  regularizing  operator  can  be  synthesized  by  associa¬ 
tive  learning  from  a  set  of  examples.  The  argument  consists  of  two  claims, 
explored  in  detail  below.  The  first  claim  is  that  the  regularizing  operator 
corresponding  to  a  quadratic  variational  principle  is  linear.  The  second  is 
that  any  linear  mapping  between  two  vector  spaces  can  be  synthesized  by 
an  associative  scheme  based  on  the  computation  of  the  pseudoinverse  of  the 
data. 

3.1.1  Linearity  of  the  regularized  solution 

To  show  that  variational  principles  of  the  form  of  Equation  A2.2  lead 
to  a  regularized  solution  that  is  a  linear  transformation  of  the  data,  we  start 
with  the  discrete  form  of  Equation  A2.2: 

||A=-j,||2  +  A||P2||2,  (.43,1) 

in  which  r  and  y  are  vectors  and  A  and  the  Tikhonov  stabilizer  P  are  ma¬ 
trices,  A  does  not  depend  on  the  data,  and  )j  ■  )|  is  a  norm. 

The  minimum  of  this  functional  will  occur  at  its  unique  stationary  point 
2.  To  find  2,  we  set  to  zero  the  gradient  with  respect  to  z  of  Equation 


A3.1.  The  solution  to  the  resulting  Euler-Lagrange  equations  is  the  minimum 


v  ctor  2: 


{AT  A  +  \PT  P)z  =  ATy. 


(.43.2) 


It  follows  that  the  solution  z  is  a  linear  transformation  of  the  data  y: 


=  Ly, 


(.43.3) 


where  L  is  the  linear  regularizing  operator.  (If  the  problem  were  well-posed, 
the  operator  L  would  equal  simply  the  inverse  of  A.)  It  is  important  to  note 
that  L  may  depend  on  the  given  lattice  of  data  points. 

3.1.2  Learning  a  linear  mapping 

Given  that  the  mapping  between  a  set  of  input  vectors  y  and  their 
regularized  solutions  2  is  linear,  how  do  we  solve  for  it?  We  start  by  arranging 
the  sets  of  vectors  in  two  matrices  Y  and  Z.  The  problem  of  synthesizing  the 
regularizing  operator  L  is  then  equivalent  to  “so’ving”  the  following  equation 
for  L: 


Z  =  LY 


A  general  solution  to  this  problem  is  given  by 


L  =  ZF+, 


(.43.4) 


(A3. 5) 


where  K+  is  the  pseudoinverse  of  T.  This  is  the  solution  which  is  most  robust 
against  errors,  if  Equation  A3. 4  admits  several  solutions  and  it  is  the  optimal 
solution  in  the  least-squares  sense,  if  no  exact  solution  of  Equation  A3. 4 
exists.  This  latter  case  is  the  one  of  interest  to  us:  in  order  to  overconstrain 
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the  problem,  and  so  avoid  look-up  table  solutions,  we  require  that  the  number 
of  examples  (columns  of  K)  be  larger  than  the  rank  of  the  matrix  L.  In  this 
case,  there  is  no  exact  solution  of  Equation  A3. 4  and  the  matrix  L  is  chosen 
instead  to  minimize  the  expression 


M  =  || LY  -  Zf.  (A3. 6) 

L  may  be  computed  directly  by  minimizing  A3. 6,  which  yields 

£=--  ZYT(YYTr'  ( .43.7 ) 

In  practice,  we  compute  L  using  Equation  A3. 7,  but  first  regularize  it 
by  adding  a  stabilizing  functional  to  obviate  problems  )f  numerical  stability 
(Tikhonov  and  Ars«  nin,  1077). 

These  results  show  that  the  standard  regularizing  operator  L  (parame¬ 
trized  by  the  lattice  of  data  points)  can  be  synthesized  without  need  of  an 
explicit  variational  principle,  if  a  sufficient  set  of  correct  input-output  pairs  is 
available  to  the  system.  Note  that  by  supplying  as  examples  the  physically 
correct  solutions  z,  we  assume  that  they  are  identical  to  the  regularized 
solutions  z,  and  enforce  both  regularization  and  correctness  on  the  linear 
operator  we  obtain. 

3.2  Recursive  estimation  of  L 

It  is  of  particular  import  for  practical  applications  that  the  pseudoinverse 
can  be  computed  in  an  adaptive  way  by  updating  it  when  new  data  become 
available  (Albert,  1972).  Consider  again  Equation  A3. 7.  Assume  that  the 
matrix  Y  consists  of  n  —  1  input  vectors  and  Z  of  the  corresponding  correct 
outputs.  We  rewrite  Equation  A3. 7  as 


If  another  input-output  pair  yn  and  zn  becomes  available,  we  can  compute 


L„  recursively  by 


L  n  —  Lji  —  i  T  ( z  ii  L  n  _  ]  y  n )  1 n , 


(.43.9) 


where 


r  vWn- 
n  I  +  yUYn-iY^-W 


(.43.10) 


provided  that  (V'„_1y'n7l1  )_1  exists  \i.e..  that  the  number  of  columns  in  Y  is 
greater  than  or  equal  to  the  dimension  of  y).  The  case  in  which  {Yn-\Y,^_l)~l 
does  not  exist  is  discussed  together  with  more  general  results  in  Albert 
(1972).  Note  that  (^n  —  in  the  updating  Equation  A3. 9  is  the  error 


b< ’Tween  the  desired  output  and  the  predicted  one.  in  terms  of  the  current 
L  The  coefficient  t„  is  the  weight  of  the  correction:  with  the  value  given 


by  Equation  .43.10  the  correction  is  optimal  and  cannot  be  improved  by  any 


iteration  without  new  data.  A  different  value  of  the  coefficient  is  suboptimal 


but  may  be  used  to  converge  to  the  optimal  solution  by  successive  iterations 


of  Equation  A3. 10  using  the  same  data. 


mmm 
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Appendix  4 

Optimal  Linear  Estimation, 
Regression  and  Bayesian  Estimation 


The  optimal  linear  estimation  scheme  we  have  described  is  closely  related 
to  a  special  case  of  Bayesian  estimation  in  which  the  best  linear  unbiased 
estimator  (BLUE)  is  found.  Consider  Equation  A3. 3:  the  problem  is  to 
construct  an  operator  L  that  provides  the  best  estimation  Ly  of  2.  We  assume 
that  the  vectors  y  and  2  are  sample  sequences  of  gaussian  stochastic  processes 
with,  for  simplicity,  zero  mean.  Under  these  conditions  the  processes  are  fully 
specified  by  their  correlation  functions 


tv 


£()/!/T]  =  C„,  f.'Nr]  =  C„  (-44.1) 

where  E  indicates  the  expected  value.  The  BLUE  of  z  (see  Albert,  1972)  is, 
given  y. 


_  r<  /"-l„ 

Z  —  C  yy  Vi 

which  is  to  be  compared  with  the  regression  equation 


(.44.2) 


Ly  =  ZYr(YYT)-ly. 


(.44.3) 


The  quantities  ZY  and  YY  are  approximations  to  C.y  and  Cyy,  re¬ 
spectively,  since  the  quantities  are  estimated  over  a  finite  number  of  observa¬ 
tions  (the  training  examples).  Thus  there  is  a  direct  relation  between  BLUEs 
and  optimal  linear  estimation.  The  learned  operator  captures  the  stochastic 
regularities  of  the  input  and  output  signals.  Note  that  if  th  -  input  vector, 
y  are  orthonormal,  then  L  =  ZV7  and  the  problem  reduce-  o  const  met  in 
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a  simple  correlation  memory  of  the  holographic  tyj>e  (see  Poggio.  1975a, b). 
Under  no  restrictions  on  the  vectors  ij,  the  correlation  matrix  Z\  1  may 
still  be  considered  as  a  low-order  approximation  to  the  optimal  operator  (see 
Kohonen,  1977). 
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